Multi-dimensional text clustering with user behavior characteristics
LI Wanying, HUANG Ruizhang, DING Zhiyuan, CHEN Yanping, XU Liyang
Journal of Computer Applications    2018, 38 (11): 3127-3131.   DOI: 10.11772/j.issn.1001-9081.2018041357
Traditional multi-dimensional text clustering generally extracts features from text content but seldom considers the interaction information between users and text data, such as likes, forwards, reviews, follows, and references. Moreover, traditional multi-dimensional text clustering mainly combines multiple spatial dimensions linearly and fails to consider the relationships between attributes within each dimension. To make effective use of text-related user behavior information, a Multi-dimensional Text Clustering with User Behavior Characteristics (MTCUBC) model was proposed. Based on the principle that the similarity between texts should be consistent across different spaces, the similarity was adjusted by using user behavior information as constraints on text content clustering, and the distance between texts was then refined by a metric learning method, thereby improving the clustering effect. Extensive experiments verify that the proposed MTCUBC model is effective and that it shows clear advantages on high-dimensional sparse data compared with linearly combined multi-dimensional clustering.
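The idea of using behavior information as constraints for a learned distance can be illustrated with a small sketch. This is not the MTCUBC algorithm, whose details the abstract does not give. It assumes a simple stand-in: pairs of texts with high behavior similarity become must-link constraints, pairs with low similarity become cannot-link constraints, and a diagonal feature weighting is learned from them. All function names, thresholds, and the smoothing term are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def behavior_constraints(behavior, hi=0.8, lo=0.2):
    """Turn behavior-space similarity into pairwise constraints.

    Assumption: pairs with similarity >= hi should be close in content
    space (must-link), pairs with similarity <= lo should be far apart
    (cannot-link). The thresholds are illustrative only.
    """
    must, cannot = [], []
    for i in range(len(behavior)):
        for j in range(i + 1, len(behavior)):
            s = cosine(behavior[i], behavior[j])
            if s >= hi:
                must.append((i, j))
            elif s <= lo:
                cannot.append((i, j))
    return must, cannot

def learn_diagonal_metric(content, must, cannot, smooth=0.1):
    """Heuristic diagonal metric: up-weight content features that
    separate cannot-link pairs and agree on must-link pairs."""
    dim = len(content[0])

    def mean_sq_diff(pairs, d):
        if not pairs:
            return 1.0  # no evidence: treat the feature as neutral
        return sum((content[i][d] - content[j][d]) ** 2
                   for i, j in pairs) / len(pairs)

    raw = [(mean_sq_diff(cannot, d) + smooth) / (mean_sq_diff(must, d) + smooth)
           for d in range(dim)]
    total = sum(raw)
    return [w * dim / total for w in raw]  # normalize so weights average 1

def weighted_distance(u, v, weights):
    """Euclidean distance under the learned diagonal weights."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))

# Toy data: content feature 0 follows the behavior groups, feature 1 is noise.
content = [[1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 1.0]]
behavior = [[1, 0], [1, 0], [0, 1], [0, 1]]
must, cannot = behavior_constraints(behavior)
weights = learn_diagonal_metric(content, must, cannot)
```

With the learned weights, texts that users treat alike end up closer in content space than texts from different behavior groups, and the adjusted distances could then be passed to any distance-based clustering method.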
Multi-source text topic mining model based on Dirichlet multinomial allocation model
XU Liyang, HUANG Ruizhang, CHEN Yanping, QIAN Zhisen, LI Wanying
Journal of Computer Applications    2018, 38 (11): 3094-3099.   DOI: 10.11772/j.issn.1001-9081.2018041359
With the rapid increase in text data sources, topic mining for multi-source text data has become a research focus of text mining. Since traditional topic models are mainly oriented to a single source, they face many limitations when applied directly to multi-source data. Therefore, a multi-source topic model based on the Dirichlet Multinomial Allocation (DMA) model, namely MSDMA (Multi-Source Dirichlet Multinomial Allocation), was proposed, taking into account the differences between sources in topic word distributions and the nonparametric clustering property of DMA. The main contributions of the proposed model are as follows: 1) it takes into account the characteristics of each source when modeling topics, and can learn the source-specific word distributions of topic k; 2) it can improve topic discovery for sources with high noise and low information through knowledge sharing; 3) it can automatically learn the number of topics within each source without requiring it to be specified in advance. Experimental results on a simulated dataset and two real datasets indicate that the proposed model can extract topic information more effectively and efficiently than state-of-the-art topic models.
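To show how a DMA-style model can leave the number of active topics open, the sketch below computes the collapsed Gibbs assignment probabilities for one document in a plain single-source Dirichlet multinomial mixture. It is not the MSDMA model: the source-specific word distributions and knowledge sharing described above are omitted. The function names, the symmetric hyperparameters `alpha` and `beta`, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter

def log_predictive(doc_counts, cluster_counts, vocab_size, beta):
    """Log Dirichlet-multinomial predictive probability of a document's
    word counts given the words already assigned to a cluster
    (the multinomial coefficient is dropped; it is constant across clusters)."""
    cluster_total = sum(cluster_counts.values())
    doc_len = sum(doc_counts.values())
    lp = (math.lgamma(cluster_total + vocab_size * beta)
          - math.lgamma(cluster_total + doc_len + vocab_size * beta))
    for word, c in doc_counts.items():
        n_kw = cluster_counts.get(word, 0)
        lp += math.lgamma(n_kw + c + beta) - math.lgamma(n_kw + beta)
    return lp

def assignment_probs(doc, clusters, sizes, vocab_size, alpha=1.0, beta=0.1):
    """Probability of assigning `doc` to each of K clusters.

    clusters[k] holds word counts of cluster k and sizes[k] its document
    count, both excluding `doc`. The prior term sizes[k] + alpha / K keeps
    empty clusters reachable, so the number of occupied clusters is not
    fixed in advance.
    """
    k_max = len(clusters)
    doc_counts = Counter(doc)
    logs = [math.log(sizes[k] + alpha / k_max)
            + log_predictive(doc_counts, clusters[k], vocab_size, beta)
            for k in range(k_max)]
    peak = max(logs)  # subtract the max before exponentiating for stability
    unnorm = [math.exp(v - peak) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Toy state: two occupied clusters and one empty cluster.
clusters = [Counter({"cat": 5, "dog": 4}), Counter({"stock": 6, "bond": 3}), Counter()]
sizes = [3, 3, 0]
probs = assignment_probs(["cat", "dog", "cat"], clusters, sizes, vocab_size=4)
```

In a full sampler, each document would be removed from its current cluster, reassigned by drawing from these probabilities, and added back, with the sweep repeated until the assignments stabilize.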